Metric and trigonometric pruning for clustering of uncertain data in 2D geometric space

Authors

  • Wang Kay Ngai
  • Ben Kao
  • Reynold Cheng
  • Michael Chau
  • Sau Dan Lee
  • David Wai-Lok Cheung
  • Kevin Y. Yip
Abstract

We study the problem of clustering data objects with location uncertainty. In our model, a data object is represented by an uncertainty region over which a probability density function (pdf) is defined. One method to cluster such uncertain objects is to apply the UK-means algorithm [1], an extension of the traditional K-means algorithm, which assigns each object to the cluster whose representative has the smallest expected distance from it. For an arbitrary pdf, calculating the expected distance between an object and a cluster representative requires expensive integration of the pdf. We study two pruning methods, pre-computation (PC) and cluster shift (CS), that can significantly reduce the number of integrations computed. Both pruning methods rely on good bounding techniques. We propose and evaluate two such techniques, based on metric properties (Met) and trigonometry (Tri). Our experimental results show that Tri offers very high pruning power: in some cases, more than 99.9% of the expected distance calculations are pruned. This results in a very efficient clustering algorithm.

Part of this paper appears in Ngai et al., 2006 [2], in which the algorithms PC and ...

Preprint submitted to Information Systems, August 19, 2010.
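The idea of avoiding expected-distance integrations through bounds can be illustrated with a small sketch. It is not the paper's PC or CS procedure: it assumes the pdf is approximated by equally weighted sample points, uses the sample centroid as a hypothetical anchor point, and applies only the triangle inequality, which gives |ED(o, y) - d(y, c)| <= ED(o, c) <= ED(o, y) + d(y, c) for any anchor y. All function names here are illustrative.

```python
import math
import random


def expected_distance(samples, c):
    """Approximate E[d(X, c)] from equally weighted 2D samples of the pdf."""
    return sum(math.dist(s, c) for s in samples) / len(samples)


def centroid(samples):
    """Mean position of the samples, used here as the anchor point (an assumption)."""
    n = len(samples)
    return (sum(x for x, _ in samples) / n, sum(y for _, y in samples) / n)


def assign_with_pruning(samples, reps, anchor, ed_anchor):
    """Return (index of the closest representative, number of EDs computed).

    ed_anchor is the expected distance from the object to the anchor; it is
    computed once per object and reused for every representative.
    """
    # Triangle-inequality bounds on ED(object, c) for each representative c.
    bounds = []
    for c in reps:
        d = math.dist(anchor, c)
        bounds.append((abs(ed_anchor - d), ed_anchor + d))
    min_upper = min(ub for _, ub in bounds)

    best, best_ed, computed = None, math.inf, 0
    for j, (lb, _) in enumerate(bounds):
        if lb > min_upper:
            continue  # some other representative is certainly closer
        ed = expected_distance(samples, reps[j])  # the expensive step
        computed += 1
        if ed < best_ed:
            best, best_ed = j, ed
    return best, computed


# Usage: one uncertain object (200 samples) and four cluster representatives.
random.seed(0)
obj = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200)]
reps = [(0.5, 0.5), (10.0, 10.0), (-12.0, 3.0), (20.0, -5.0)]
anchor = centroid(obj)
ed_anchor = expected_distance(obj, anchor)
best, computed = assign_with_pruning(obj, reps, anchor, ed_anchor)
print(best, computed)
```

A representative is skipped only when its lower bound exceeds the smallest upper bound, so the assignment is the same as the one obtained by computing every expected distance; only the number of expensive evaluations changes.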

Similar resources

Target detection Bridge Modelling using Point Cloud Segmentation Obtained from Photogrameric UAV

In recent years, great efforts have been made to generate 3D models of urban structures in photogrammetry and remote sensing. 3D reconstruction of the bridge, as one of the most important urban structures in transportation systems, has been neglected because of its geometric and structural complexity. Due to the UAV technology development in spatial data acquisition, in this study, the point cl...

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

Using an Imperialistic Competitive Algorithm in Global Polynomials Optimization (Case Study: 2D Geometric Correction of IKONOS and SPOT Imagery)

The number of high resolution space imageries in photogrammetry and remote sensing society is growing fast. Although these images provide rich data, the lack of sensor calibration information and ephemeris data does not allow the users to apply precise physical models to establish the functional relationship between image space and object space. As an alternative solution, some generalized mode...

Geometric Modeling of Dubins Airplane Movement and its Metric

The time-optimal trajectory for an airplane from some starting point to some final point has been studied by many authors. Here, we consider the extension of the planar robot motion of the Dubins model to three-dimensional space. In this model, the system has independent bounded control over both the altitude velocity and the turning rate of airplane movement in a non-obstacle space. Here, in this paper a g...

Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques

Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...

Journal title:
  • Inf. Syst.

Volume 36

Publication date: 2011